Detecting Topic Drift with Compound Topic Models
Abstract
The Latent Dirichlet Allocation topic model of Blei, Ng, & Jordan (2003) is well-established as an effective approach to recovering meaningful topics of conversation from a set of documents. However, a useful analysis of user-generated content is concerned not only with the recovery of topics from a static data set, but with the evolution of topics over time. We employ a compound topic model (CTM) to track topics across two distinct data sets (i.e. past and present) and to visualize trends in topics over time; we evaluate several metrics for detecting a change in the distribution of topics within a time-window; and we illustrate how our approach discovers emerging conversation topics related to current events in real data sets.
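As a rough illustration of what a metric for "a change in the distribution of topics within a time-window" could look like, the sketch below compares the topic proportions of a past window and a present window. The abstract does not name the metrics it evaluates, so the Jensen-Shannon divergence used here is an assumption, as are the function names, the example topic counts, and the threshold value.

```python
import math


def normalize(counts):
    """Turn per-topic counts for one time window into a probability distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("window contains no topic assignments")
    return [c / total for c in counts]


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must cover the same set of topics")
    # The mixture m is positive wherever p or q is positive, so KL is always finite.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def drift_detected(past_counts, present_counts, threshold=0.1):
    """Flag drift when the two windows' topic distributions differ by more than
    `threshold` (a hypothetical cut-off, not a value taken from the paper)."""
    p = normalize(past_counts)
    q = normalize(present_counts)
    return js_divergence(p, q) > threshold


if __name__ == "__main__":
    past = [90, 10, 0]      # hypothetical per-topic document counts, past window
    present = [10, 10, 80]  # hypothetical counts, present window: a new topic dominates
    print(drift_detected(past, present))
```

Jensen-Shannon divergence is symmetric and bounded, which makes a fixed threshold easier to interpret than with raw KL divergence; any other distance between distributions could be substituted in `drift_detected`.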
Similar resources
“How Did We Get Here?”: Topic Drift in Online Health Discussions
BACKGROUND: Patients increasingly use online health communities to exchange health information and peer support. During the progression of health discussions, a change of topic (topic drift) can occur. Topic drift is a frequent phenomenon linked to incoherence and frustration in online communities and other forms of computer-mediated communication. For sensitive topics, such as health, such drift ...
Detecting Concept Drift in Data Stream Using Semi-Supervised Classification
A data stream is a sequence of data generated from various information sources at high speed and in high volume. Classifying data streams faces three challenges: unlimited length, online processing, and concept drift. In related research, the challenge of unlimited stream length is commonly met by dividing the stream into fixed-size windows or by using gradual forgetting. Concept drift refer...
A Probabilistic Topic Model Based on Local Word Relations in Overlapping Windows
A probabilistic topic model assumes that documents are generated through a process involving topics and then tries to reverse this process: given the documents, it extracts the topics. A topic is usually assumed to be a distribution over words. LDA is one of the first and most popular topic models introduced so far. In the document generation process assumed by LDA, each document is a distribution o...
Entity Set Expansion using Interactive Topic Information
We propose a new method for entity set expansion that achieves highly accurate extraction by suppressing the effect of semantic drift; it requires a small amount of interactive information. We use interactive information, not only contextual information, to re-train the topic models (based on interactive Unigram Mixtures). Although the topic information extracted from an unsupervised co...
Constructive Chaos: Topic Management in Asynchronous Learning Networks
Maintaining topic integrity in online discussions can be problematic for instructors. Understanding the underlying mechanisms of topic drift yields insight into how online classroom discussions can be effectively managed, enabling facilitators to assure conversational flow while limiting unproductive digressions. This paper presents an analysis of topic drift in asynchronous learning environmen...
Publication date: 2009